1. INTRODUCTION

The recognition and classification of Arabic language letters is an interesting subject in
Arabic human-computer interaction applications, and computer interaction is an important tool in intelligent
systems and technologies. Language recognition here means speech recognition, which is characterized as the
process of converting sound waves (acoustic speech signals) into the corresponding set of words or other
linguistic units [1]. Recognition follows a specific sequence of algorithmic steps built on extracting
features from the signal to be recognized, where the features carry the essence of the speaker [2], [3].
These features are then reduced to minimize the effort required by digital signal processing
applications [4], [5], [6], [7].

The speech signal also carries information about the particular speaker, including
social factors, affective factors, and the properties of the real voice production [8]. Speech therefore has
the potential to be an important mode of interaction with the computer, and speech processing is one of the
exciting areas of signal processing [9]. The letters of the Arabic language differ from those of other
languages because the pronunciation of a letter changes according to its position in the word. The
pronunciation also varies according to the impact of the word in the sentence, and it differs
between Arab countries according to their dialects.

The recognition process is based on two phases, training and testing. The training
phase applies a suitable neural network (NN) algorithm to the extracted features, and the trained network is
then used in the testing phase. There are many types of NN algorithms; the Radial Basis Function (RBF)
network is one of the optimal algorithms in a noisy environment. Its output is a linear combination of radial
basis functions, and it can be used in function approximation, time series prediction, and control.
Its advantages over other algorithms are faster convergence, smaller extrapolation errors, and
higher reliability [10].

Journal homepage: http://iaescore.com/journals/index.php/IJECE
Int J Elec & Comp Eng, ISSN: 2088-8708

In recent years, researchers have been looking into optimal features and methods for recognizing
Arabic language letters [11]. Given the importance of the subject, there is a substantial body of
research in this area, including the following.

In [12], a Hidden Markov Model (HMM) is used as the feature extraction algorithm, noise
reduction is performed with a power spectral estimator, and a Gabor filter bank is used for noise
separation in an acoustic event detection system. In [13], an HMM is used with Mel Frequency
Cepstral Coefficient (MFCC) features under noise-free conditions for speech recognition. In [14],
five speech parameters are used as features for speech recognition: Relative
Spectral Processing (RASTA), MFCC, Linear Predictive Coding (LPC) analysis, Dynamic Time Warping
(DTW), and Zero Crossings with Peak Amplitudes (ZCPA). RASTA and MFCC are extracted as
features in addition to serving as parameters, while LPC predicts features from previous ones. In [15],
cepstral frequency coefficients and perceptual linear prediction are presented as feature extraction
methods, RASTA filtering and cepstral mean subtraction as feature normalization
techniques, and a combination of Gaussian mixture models (GMM) with linear and non-linear kernels
based on a support vector machine (SVM) for speaker identification. In [16], an FFT is used for gender
identification, and a back-end system then builds a gender model that recognizes gender with an average
accuracy of 80%. The use of signal processing techniques for speech recognition in a particular language
is presented in [17], where feature extraction depends on the adopted algorithms, and a comparison
between these algorithms is presented in [18]. A hybrid of HMM and RBF
for continuous speech recognition, with a 65% recognition rate, was presented in [10].

The problem is twofold. First, research interest in Arabic automatic speech recognition (ASR)
is limited. Second, most of the published papers rely on the HMM algorithm, whose ASR accuracy
is affected by several factors: the phoneme set used, the number of HMM states
allocated to each phoneme, the duration of each phoneme, and noisy environments, all of which
reduce accuracy.

Therefore, this paper gives an overview of speech feature extraction and presents the proposed work,
which consists of three steps (preprocessing, feature extraction, and classification), followed by a
comparison with other works. The study deals with different letter positions in the word, the impact of the
word in the sentence, and letter pronunciation in the dialects of different Arab countries. The number of
training patterns is increased from 10 to 30 per class, with a constant number of testing patterns per
class, under noisy conditions and for speaker-independent recognition. The proposed work is based on the
Radial Basis Function neural network with statistical features, and it is compared with
previous works that used HMM and other algorithms.


2. SPEECH FEATURE EXTRACTION 

The structure of the vocal organs generates a wide variety of waveforms. These waveforms can be
broadly categorized into voiced and unvoiced speech, and this categorization is made after feature
extraction [9]. Two kinds of algorithms are used for feature extraction: the first is
related to the speech processes, while the second is related to the results of these processes. The feature
vectors are equivalent to the vectors of explanatory variables used in statistical procedures such as linear
regression [19]. The features are therefore as follows.
a. Articulatory features 

Articulatory features (AFs) have attracted interest from the speech recognition community because
they describe the configuration of the human vocal tract and the properties of speech production.
The essential idea of this approach is to stay close to the articulatory events underlying the
speech signal. The representation is composed of classes describing the basic articulatory properties of
speech sounds, for example place, manner, voicing, lip rounding, the opening between the lips, and the
position of the tongue.
b. Features based on perception system 

The auditory system is the sensory system for the sense of hearing. Research
in speech recognition deals with the way in which humans recognize speech and use
speech information to understand spoken language [20]. Statistical features can be
considered the second kind of feature, although they are also related to the first kind. Statistical
features can therefore serve as effective features in speech recognition applications.
3. PROPOSED WORK

The proposed work in this paper consists of three stages: preprocessing, statistical feature extraction,
and classification.
a. Preprocessing 

The preprocessing stage prepares the signal for the feature extraction stage. It consists of five
steps: silence removal, normalization, pre-emphasis, framing and windowing, and the selection of one frame.
Silence removal reduces the amount of data to be processed and keeps only the samples that contain
information. Normalization limits the range of the sample values to be processed. Pre-emphasis boosts the
energy in the upper frequency band, while framing segments the signal, which also reduces
processing time and data size. One Arabic letter is taken as an example of this stage in
Figure 1: the sound signal of the letter Sad (ص) was recorded in a real
environment, as in Figure 1(a-b), and the effect of the environment was then removed. The results of the
preprocessing steps strengthen our confidence in the statistical features as classification tools: after
silence removal and normalization, framing and selecting one windowed frame preserves the character of the
signal while decreasing the number of samples to process, as shown in Figure 1.
Figure 1. Preprocessing stage
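
The five preprocessing steps described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the energy-based silence threshold, the frame length of 256 samples, the hop of 128 samples, the pre-emphasis coefficient of 0.97, and the Hamming window are all assumptions chosen for illustration, since the paper does not state these values.

```python
import numpy as np

def remove_silence(x, frame_len=256, threshold_ratio=0.05):
    """Drop blocks whose energy is below a fraction of the largest block energy.

    The energy criterion and threshold_ratio are illustrative assumptions.
    """
    n_blocks = len(x) // frame_len
    if n_blocks == 0:
        return x
    blocks = x[: n_blocks * frame_len].reshape(n_blocks, frame_len)
    energy = np.sum(blocks ** 2, axis=1)
    keep = energy > threshold_ratio * energy.max()
    return blocks[keep].ravel()

def normalize(x):
    """Scale the samples so that the peak absolute value is 1."""
    peak = np.max(np.abs(x))
    return x / peak if peak > 0 else x

def pre_emphasis(x, alpha=0.97):
    """First-order high-pass filter y[n] = x[n] - alpha * x[n-1]."""
    return np.append(x[0], x[1:] - alpha * x[:-1])

def frame_signal(x, frame_len=256, hop=128):
    """Split the signal into overlapping frames (one frame per row)."""
    if len(x) < frame_len:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def preprocess(x, frame_len=256, hop=128, frame_index=0):
    """Run all steps and return a single windowed frame."""
    x = remove_silence(np.asarray(x, dtype=float), frame_len)
    x = normalize(x)
    x = pre_emphasis(x)
    frames = frame_signal(x, frame_len, hop)
    return frames[frame_index] * np.hamming(frame_len)
```

Calling `preprocess(signal)` on a recorded letter returns one windowed frame of `frame_len` samples, which is the reduced input that the feature extraction stage would operate on.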
b. Statistical Features

Statistical features represent the core of the signal and carry its essential character.
Some of these features are the zero crossing rate, signal energy, temporal centroid, energy entropy
(EE), RMS, spectral flux, spectral energy, and MFCC. These features are
suitable for the sound signal, as shown in Figure 2. After the preprocessing step, a subset of clear-cut,
effective features was used instead of all extracted features, to reduce processing time while maintaining
accuracy.
Figure 2. Statistical features of the sound signal (for the letter Sad (ص))
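
Most of the statistical features listed above have short, standard definitions, sketched below for a single frame. The exact definitions used in the paper are not given, so the energy-weighted temporal centroid, the number of sub-blocks in the energy entropy, and the normalization in the spectral flux are assumptions. MFCC is omitted because it needs a mel filter bank that the paper does not specify.

```python
import numpy as np

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    positive = frame >= 0
    return float(np.mean(positive[1:] != positive[:-1]))

def signal_energy(frame):
    """Sum of squared samples."""
    return float(np.sum(frame ** 2))

def rms(frame):
    """Root mean square amplitude."""
    return float(np.sqrt(np.mean(frame ** 2)))

def temporal_centroid(frame):
    """Energy-weighted mean sample index (assumed definition)."""
    e = frame ** 2
    total = e.sum()
    return float(np.sum(np.arange(len(frame)) * e) / total) if total > 0 else 0.0

def energy_entropy(frame, n_blocks=8):
    """Shannon entropy (bits) of the energy distribution over sub-blocks."""
    e = np.array([np.sum(b ** 2) for b in np.array_split(frame, n_blocks)])
    total = e.sum()
    if total == 0:
        return 0.0
    p = e / total
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def spectral_energy(frame):
    """Energy computed in the frequency domain (equals signal_energy by Parseval)."""
    return float(np.sum(np.abs(np.fft.fft(frame)) ** 2) / len(frame))

def spectral_flux(prev_frame, frame):
    """Squared change between the normalized magnitude spectra of two frames."""
    a = np.abs(np.fft.rfft(prev_frame))
    b = np.abs(np.fft.rfft(frame))
    a = a / (a.sum() + 1e-12)
    b = b / (b.sum() + 1e-12)
    return float(np.sum((b - a) ** 2))

def feature_vector(frame):
    """Collect the single-frame features into one vector."""
    return np.array([
        zero_crossing_rate(frame), signal_energy(frame), temporal_centroid(frame),
        energy_entropy(frame), rms(frame), spectral_energy(frame),
    ])
```

A vector built this way has six entries per frame, which keeps the classifier input small compared with the raw samples.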
c. Classification

The classification stage is based on the Radial Basis Function (RBF) neural network shown
in Figure 3. This stage is affected by many factors, one of the most influential being the number of training
and testing patterns. Increasing the number of training patterns reduces the apparent similarity
between the patterns, so that the differences between them emerge. Many experiments were carried out with
different letter positions in the word, the impact of the word in the sentence, and letter pronunciation in
the dialects of different Arab countries. The number of training patterns was also increased from 10 to 30
per class, with ten testing patterns for each class. Tables 1, 2, 3, and 4 show the classification results
for the different experimental parameters.


Figure 3. Radial Basis Function (RBF) neural network
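
The structure of an RBF classifier (Gaussian hidden units driven by the Euclidean distance to each center, followed by a linear output layer) can be sketched as below. This is an illustrative sketch rather than the network used in the paper: using the training patterns themselves as centers, a single shared width `sigma`, and least-squares output weights are all assumptions, and the temporal variant (TRBF) reported in Table 4 is not reproduced.

```python
import numpy as np

class RBFNetwork:
    """Gaussian RBF hidden layer followed by a linear output layer."""

    def __init__(self, centers, sigma=1.0):
        self.centers = np.asarray(centers, dtype=float)  # one row per hidden neuron
        self.sigma = float(sigma)                        # shared Gaussian width (assumed)
        self.weights = None

    def _hidden(self, X):
        # Squared Euclidean distance from every input to every center.
        d2 = ((X[:, None, :] - self.centers[None, :, :]) ** 2).sum(axis=2)
        return np.exp(-d2 / (2.0 * self.sigma ** 2))

    def fit(self, X, y, n_classes):
        """Solve for the output weights by linear least squares on one-hot targets."""
        H = self._hidden(np.asarray(X, dtype=float))
        targets = np.eye(n_classes)[np.asarray(y)]
        self.weights, *_ = np.linalg.lstsq(H, targets, rcond=None)
        return self

    def predict(self, X):
        """Return the class with the largest linear output for each input."""
        scores = self._hidden(np.asarray(X, dtype=float)) @ self.weights
        return np.argmax(scores, axis=1)
```

With feature vectors as rows of `X` and integer letter labels in `y`, `RBFNetwork(X, sigma).fit(X, y, n_classes).predict(X_test)` yields the predicted letter index for each test pattern.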


Table 1. Recognition of the Bee (ب) Letter with a Different Number of Training Patterns for Different Letter
Position, Impact, and Letter Pronunciation in Different Arab Countries' Dialects

Letters   No. of training   Results %   No. of training   Results %   No. of training   Results %
©         10                50          20                80          30                90
zZ        10                0           20                20          30                0
vu        10                10          20                0           30                0
A         10                0           20                0           30                0
ua        10                40          20                0           30                10


Table 2. Recognition of the Bee (ب) Letter with a Different Number of Training Patterns for One Arab Country's
Dialect with Different Letter Position and Impact

Letters   No. of training   Results %   No. of training   Results %   No. of training   Results %
a         10                91.66       20                96.66       30                98.33
t         10                0           20                3.33        30                0
vu        10                1.66        20                0           30                0
J         10                0           20                0           30                0
ua        10                6.66        20                0           30                1.66


Table 3. Recognition of the Bee (ب) Letter with a Different Number of Training Patterns for One Arab Country's
Dialect with Different Letter Position

Letters   No. of training   Results %   No. of training   Results %   No. of training   Results %
©         10                96.875      20                98.875      30                99.375
d         10                0           20                1.25        30                0
us        10                0.625       20                0           30                0
a         10                0           20                0           30                0
us        10                2.5         20                0           30                0.625


Table 4. Classification by Using the Temporal Radial Basis Function (TRBF) Neural Network

Letters   ue        A         o         Z        a
a         99.375%   0         0         0.625    0
Z         1.25      98.125%   0.625     0        0
uw        0         0.625     98.125%   1.25     0
A         1.875     0         0.625     97.5%    0
ue        1.25      0         0         0        98.75%
The average classification rate is 95.9% for five letters in the experiment conducted with
different parameters of the letter situations, while it is 98.175% for one Arab country's dialect
with different letter positions. The advantage of the proposed algorithm is demonstrated by
comparing these classification results with those of other works, as presented in
Table 5.


Table 5. Comparison of the proposed work with others

Ref                  Classification method   Average Recognition %
[12]                 HMM                     74.5%
[21]                 TMNN                    90.7%
[22]                 MLP                     96.3%
Previous work [23]   MLFFNN                  96.33%
Present work         TRBFNN                  98.175%
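
The margin between the present work and the best previous result in Table 5 follows from simple arithmetic, sketched here for convenience. The dictionary keys are labels invented for this illustration; the values are copied from the table.

```python
# Average recognition rates (%) copied from Table 5.
results = {
    "[12] HMM": 74.5,
    "[21] TMNN": 90.7,
    "[22] MLP": 96.3,
    "[23] MLFFNN": 96.33,
    "Present work TRBFNN": 98.175,
}

present = results["Present work TRBFNN"]
# Best rate among the earlier works only.
best_prior = max(rate for name, rate in results.items() if name != "Present work TRBFNN")
# Difference in percentage points, rounded to suppress floating-point noise.
margin = round(present - best_prior, 3)
```

Here `best_prior` is 96.33 and `margin` is 1.845 percentage points.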


4. CONCLUSION 

Recognition based on the combination of statistical features with the Radial Basis
Function (RBF) neural network outperforms the other combinations by 1.845 percentage points
(98.175% versus 96.33% for the best previous result in Table 5). This improvement is attributed to the RBF,
in which the hidden function is a Gaussian and the Euclidean distance is computed from the test point to
the center of each neuron. The average recognition rate achieved by this combination
is 98.175%. The parameters that affect Arabic letter classification are the letter position in
the word, the impact of the word in the sentence, and the letter pronunciation in the dialects of
different Arab countries.